11 research outputs found

    Tweets as data: Demonstration of TweeQL and TwitInfo

    Microblogs such as Twitter are a tremendous repository of user-generated content. Increasingly, we see tweets used as data sources for novel applications such as disaster mapping, brand sentiment analysis, and real-time visualizations. In each scenario, the workflow for processing tweets is ad-hoc, and a lot of unnecessary work goes into repeating common data processing patterns. We introduce TweeQL, a stream query processing language that presents a SQL-like query interface for unstructured tweets to generate structured data for downstream applications. We have built several tools on top of TweeQL, most notably TwitInfo, an event timeline generation and exploration interface that summarizes events as they are discussed on Twitter. Our demonstration will allow the audience to interact with both TweeQL and TwitInfo to convey the value of data embedded in tweets.

    Processing and visualizing the data in tweets

    Microblogs such as Twitter provide a valuable stream of diverse user-generated data. While the data extracted from Twitter is generally timely and accurate, the process by which developers extract structured data from the tweet stream is ad-hoc and requires reimplementation of common data manipulation primitives. In this paper, we present two systems for querying and extracting structure from Twitter-embedded data. The first, TweeQL, provides a streaming SQL-like interface to the Twitter API, making common tweet processing tasks simpler. The second, TwitInfo, shows how end-users can interact with and understand aggregated data from the tweet stream, in addition to showcasing the power of the TweeQL language. Together these systems show the richness of content that can be extracted from Twitter.

    TwitInfo: Aggregating and Visualizing Microblogs for Event Exploration

    Microblogs are a tremendous repository of user-generated content about world events. However, for people trying to understand events by querying services like Twitter, a chronological log of posts makes it very difficult to get a detailed understanding of an event. In this paper, we present TwitInfo, a system for visualizing and summarizing events on Twitter. TwitInfo allows users to browse a large collection of tweets using a timeline-based display that highlights peaks of high tweet activity. A novel streaming algorithm automatically discovers these peaks and labels them meaningfully using text from the tweets. Users can drill down to subevents, and explore further via geolocation, sentiment, and popular URLs. We contribute a recall-normalized aggregate sentiment visualization to produce more honest sentiment overviews. An evaluation of the system revealed that users were able to reconstruct meaningful summaries of events in a small amount of time. An interview with a Pulitzer Prize-winning journalist suggested that the system would be especially useful for understanding a long-running event and for identifying eyewitnesses. Quantitatively, our system can identify 80-100% of manually labeled peaks, facilitating a relatively complete view of each event studied.

    Modeling tax evasion with genetic algorithms

    The U.S. tax gap is estimated to exceed $450 billion, most of which arises from non-compliance on the part of individual taxpayers (GAO 2012; IRS 2006). Much is hidden in innovative tax shelters combining multiple business structures such as partnerships, trusts, and S-corporations into complex transaction networks designed to reduce and obscure the true tax liabilities of their individual shareholders. One known gambit employed by these shelters is to offset real gains in one part of a portfolio by creating artificial capital losses elsewhere through the mechanism of “inflated basis” (TaxAnalysts 2005), a process made easier by the relatively flexible set of rules surrounding “pass-through” entities such as partnerships (IRS 2009). The ability to anticipate the likely forms of emerging evasion schemes would help auditors develop more efficient methods of reducing the tax gap. To this end, we have developed a prototype evolutionary algorithm designed to generate potential schemes of the inflated basis type described above. The algorithm takes as inputs a collection of asset types and tax entities, together with a rule-set governing asset exchanges between these entities. The schemes produced by the algorithm consist of sequences of transactions within an ownership network of tax entities. Schemes are ranked according to a “fitness function” (Goldberg in Genetic algorithms in search, optimization, and machine learning. Addison-Wesley, Boston, 1989); the very best schemes are those that afford the highest reduction in tax liability while incurring the lowest expected penalty. Mitre Corporation (Innovation Program

    Global burden of 288 causes of death and life expectancy decomposition in 204 countries and territories and 811 subnational locations, 1990–2021: a systematic analysis for the Global Burden of Disease Study 2021

    BACKGROUND Regular, detailed reporting on population health by underlying cause of death is fundamental for public health decision making. Cause-specific estimates of mortality and the subsequent effects on life expectancy worldwide are valuable metrics to gauge progress in reducing mortality rates. These estimates are particularly important following large-scale mortality spikes, such as the COVID-19 pandemic. When systematically analysed, mortality rates and life expectancy allow comparisons of the consequences of causes of death globally and over time, providing a nuanced understanding of the effect of these causes on global populations. METHODS The Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) 2021 cause-of-death analysis estimated mortality and years of life lost (YLLs) from 288 causes of death by age-sex-location-year in 204 countries and territories and 811 subnational locations for each year from 1990 until 2021. The analysis used 56 604 data sources, including data from vital registration and verbal autopsy as well as surveys, censuses, surveillance systems, and cancer registries, among others. As with previous GBD rounds, cause-specific death rates for most causes were estimated using the Cause of Death Ensemble model (a modelling tool developed for GBD to assess the out-of-sample predictive validity of different statistical models and covariate permutations and combine those results to produce cause-specific mortality estimates), with alternative strategies adapted to model causes with insufficient data, substantial changes in reporting over the study period, or unusual epidemiology. YLLs were computed as the product of the number of deaths for each cause-age-sex-location-year and the standard life expectancy at each age. As part of the modelling process, uncertainty intervals (UIs) were generated using the 2·5th and 97·5th percentiles from a 1000-draw distribution for each metric.
We decomposed life expectancy by cause of death, location, and year to show cause-specific effects on life expectancy from 1990 to 2021. We also used the coefficient of variation and the fraction of population affected by 90% of deaths to highlight concentrations of mortality. Findings are reported in counts and age-standardised rates. Methodological improvements for cause-of-death estimates in GBD 2021 include the expansion of under-5-years age group to include four new age groups, enhanced methods to account for stochastic variation of sparse data, and the inclusion of COVID-19 and other pandemic-related mortality, which includes excess mortality associated with the pandemic, excluding COVID-19, lower respiratory infections, measles, malaria, and pertussis. For this analysis, 199 new country-years of vital registration cause-of-death data, 5 country-years of surveillance data, 21 country-years of verbal autopsy data, and 94 country-years of other data types were added to those used in previous GBD rounds. FINDINGS The leading causes of age-standardised deaths globally were the same in 2019 as they were in 1990; in descending order, these were ischaemic heart disease, stroke, chronic obstructive pulmonary disease, and lower respiratory infections. In 2021, however, COVID-19 replaced stroke as the second-leading age-standardised cause of death, with 94·0 deaths (95% UI 89·2-100·0) per 100 000 population. The COVID-19 pandemic shifted the rankings of the leading five causes, lowering stroke to the third-leading and chronic obstructive pulmonary disease to the fourth-leading position. In 2021, the highest age-standardised death rates from COVID-19 occurred in sub-Saharan Africa (271·0 deaths [250·1-290·7] per 100 000 population) and Latin America and the Caribbean (195·4 deaths [182·1-211·4] per 100 000 population).
The lowest age-standardised death rates from COVID-19 were in the high-income super-region (48·1 deaths [47·4-48·8] per 100 000 population) and southeast Asia, east Asia, and Oceania (23·2 deaths [16·3-37·2] per 100 000 population). Globally, life expectancy steadily improved between 1990 and 2019 for 18 of the 22 investigated causes. Decomposition of global and regional life expectancy showed the positive effect that reductions in deaths from enteric infections, lower respiratory infections, stroke, and neonatal deaths, among others have contributed to improved survival over the study period. However, a net reduction of 1·6 years occurred in global life expectancy between 2019 and 2021, primarily due to increased death rates from COVID-19 and other pandemic-related mortality. Life expectancy was highly variable between super-regions over the study period, with southeast Asia, east Asia, and Oceania gaining 8·3 years (6·7-9·9) overall, while having the smallest reduction in life expectancy due to COVID-19 (0·4 years). The largest reduction in life expectancy due to COVID-19 occurred in Latin America and the Caribbean (3·6 years). Additionally, 53 of the 288 causes of death were highly concentrated in locations with less than 50% of the global population as of 2021, and these causes of death became progressively more concentrated since 1990, when only 44 causes showed this pattern. The concentration phenomenon is discussed heuristically with respect to enteric and lower respiratory infections, malaria, HIV/AIDS, neonatal disorders, tuberculosis, and measles. INTERPRETATION Long-standing gains in life expectancy and reductions in many of the leading causes of death have been disrupted by the COVID-19 pandemic, the adverse effects of which were spread unevenly among populations. Despite the pandemic, there has been continued progress in combatting several notable causes of death, leading to improved global life expectancy over the study period. 
Each of the seven GBD super-regions showed an overall improvement between 1990 and 2021, obscuring the negative effect in the years of the pandemic. Additionally, our findings regarding regional variation in causes of death driving increases in life expectancy hold clear policy utility. Analyses of shifting mortality trends reveal that several causes, once widespread globally, are now increasingly concentrated geographically. These changes in mortality concentration, alongside further investigation of changing risks, interventions, and relevant policy, present an important opportunity to deepen our understanding of mortality-reduction strategies. Examining patterns in mortality concentration might reveal areas where successful public health interventions have been implemented. Translating these successes to locations where certain causes of death remain entrenched can inform policies that work to improve life expectancy for people everywhere. FUNDING Bill & Melinda Gates Foundation.
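
The abstract above states that YLLs are the product of deaths and the standard life expectancy at each age, and that uncertainty intervals come from the 2·5th and 97·5th percentiles of a 1000-draw distribution. A minimal Python sketch of those two calculations follows. The function names and the linear-interpolation percentile rule are assumptions made for illustration; this is not the GBD code, and the abstract does not say which percentile convention was used.

```python
def years_of_life_lost(deaths, standard_life_expectancy):
    """YLLs for one cause-age-sex-location-year cell:
    number of deaths times the standard life expectancy at that age."""
    return deaths * standard_life_expectancy


def percentile(sorted_values, q):
    """q-th percentile (0-100) of an ascending list.

    Assumption: linear interpolation between neighbouring draws.
    """
    position = (len(sorted_values) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def uncertainty_interval(draws):
    """Return the (2.5th, 97.5th) percentiles of a collection of draws."""
    ordered = sorted(draws)
    return percentile(ordered, 2.5), percentile(ordered, 97.5)
```

In use, `uncertainty_interval` would be called once per metric on its draws (1000 in the study), and `years_of_life_lost` once per cell before summing over cells.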

    Simulating tax evasion using agent based modelling and evolutionary search

    Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014. Cataloged from PDF version of thesis. Includes bibliographical references (page 61). We present a design and model for Simulating Co-Evolution of Tax and Evasion (SCOTE). The system performs agent-based modeling of the tax ecosystem and searches for tax evasion strategies using a variant of a Genetic Algorithm with a grammar. Current methodologies and tools to detect, discover, or recognize tax evasion are not sufficient. In recent years the tax gap, the aggregate difference between the tax owed in principle and the tax paid in practice, was calculated to exceed 450 billion dollars. Numerous tax evasion schemes have surfaced that perform seemingly legal transactions but, once observed closely, have the sole purpose of reducing tax liability. Moreover, these schemes are evolving with time. Whenever a scheme is detected and eliminated by fixing a loophole in the tax code, others emerge to replace it, and currently there is no systematic way to predict the emergence of these schemes. SCOTE allows us to encode tax evasion strategies into a searchable representation. SCOTE has three major components, namely the Genetic Algorithm library (GA), the Interpreter, and the Parser. The GA encodes transaction plans into an integer representation and performs search over the transaction plans to find a scheme that produces the maximum tax gap. The Parser performs grammatical mapping of a list of integers to a transaction plan. The Interpreter models the tax ecosystem as a graph in which entities such as taxpayers and partnerships are nodes and the transactions between entities are edges. Each entity has a portfolio of assets, and the values of the assets are updated after a transaction. The Interpreter runs a transaction plan generated by the GA on the graph to produce the tax gap.
We ran two experiments using two of the known tax evasion schemes, namely "Son of Boss" and "iBOB", and we were able to detect the two schemes using SCOTE. by Osama Badar. M. Eng.
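
The abstract above describes a Parser that maps a list of integers onto a transaction plan through a grammar. The Python sketch below shows one common way such a mapping can work, in the style of grammatical evolution: each integer, taken modulo the number of available productions, selects how to expand the leftmost non-terminal. The grammar, the asset names, and `map_codons` are hypothetical illustrations, not SCOTE's actual grammar or API. Only "taxpayer" and "partnership" are entity types named in the abstract.

```python
# Hypothetical toy grammar: a plan is one or more transactions between entities.
GRAMMAR = {
    "<plan>": [["<transaction>"], ["<transaction>", ";", "<plan>"]],
    "<transaction>": [["Transaction(", "<entity>", ",", "<entity>", ",", "<asset>", ")"]],
    "<entity>": [["Taxpayer"], ["Partnership"], ["Trust"]],
    "<asset>": [["Cash"], ["Stock"], ["Note"]],  # asset names are assumptions
}


def map_codons(codons, grammar=GRAMMAR, start="<plan>", max_wraps=2):
    """Map a list of integers to a plan string, or None if the mapping fails.

    The leftmost non-terminal is expanded first. A codon is consumed only
    when a non-terminal has more than one production. The codon list may be
    reused (wrapped) up to max_wraps times before the mapping is abandoned.
    """
    symbols = [start]
    output = []
    used = 0
    limit = len(codons) * (max_wraps + 1)
    while symbols:
        symbol = symbols.pop(0)
        if symbol not in grammar:
            output.append(symbol)  # terminal: emit as-is
            continue
        choices = grammar[symbol]
        if len(choices) == 1:
            production = choices[0]
        else:
            if used >= limit:
                return None  # ran out of codons before the plan terminated
            production = choices[codons[used % len(codons)] % len(choices)]
            used += 1
        symbols = list(production) + symbols
    return "".join(output)
```

A genetic algorithm would then search over integer lists, while an interpreter scores each mapped plan. Returning `None` for mappings that never terminate is one simple way to discard invalid individuals.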

    A study on the effect of bioactive glass and hydroxyapatite-loaded Xanthan dialdehyde-based composite coatings for potential orthopedic applications

    The most important challenge faced in designing orthopedic devices is to control the leaching of ions from the substrate material and to prevent biofilm formation. Accordingly, surgical-grade stainless steel (316L SS) was electrophoretically coated with a functional composition of biopolymers and bioceramics. The composite coating consisted of Bioglass (BG), hydroxyapatite (HA), and lawsone, loaded into a polymeric matrix of Xanthan Dialdehyde/Chondroitin Sulfate (XDA/CS). The parameters and final composition for electrophoretic deposition were optimized through a trial-and-error approach. The composite coating exhibited significant adhesion strength of “4B” (ASTM D3359) with the substrate, suitable wettability with a contact angle of 48°, and an optimum average surface roughness of 0.32 µm, thus promoting proliferation and attachment of bone-forming cells, transcription factors, and proteins. Fourier transform infrared spectroscopic analysis revealed a strong polymeric network formation between XDA and CS. Scanning electron microscopy and energy dispersive X-ray spectroscopy analysis displayed a homogeneous surface with invariable dispersion of HA and BG particles. The adhesion, hydrant behavior, and topography of said coatings were optimal for designing orthopedic implant devices. The coatings exhibited clear inhibition zones of 21.65 mm and 21.04 mm, with no bacterial growth, against Staphylococcus aureus (S. aureus) and Escherichia coli (E. coli), respectively, confirming the antibacterial potential. Furthermore, crystals related to calcium (Ca) and HA were seen after 28 days of submersion in simulated body fluid. The corrosion current density of the above-mentioned coating was minimal compared to the bare 316L SS substrate. The results indicate that the XDA/CS/BG/HA/lawsone-based composite coating can be a candidate for coating orthopedic implant devices.

    Effect of Wet Aging on Color Stability, Tenderness, and Sensory Attributes of Longissimus lumborum and Gluteus medius Muscles from Water Buffalo Bulls

    The present study aimed to investigate the effect of wet aging on meat quality characteristics of Longissimus lumborum (LL) and Gluteus medius (GM) muscles of buffalo bulls. Meat samples from six aging periods, i.e., 0 day (d) = control, 7 d, 14 d, 21 d, 28 d, and 35 d, were evaluated for pH, color, metmyoglobin content (MetMb%), cooking loss, water holding capacity (WHC), myofibrillar fragmentation index (MFI), Warner–Bratzler shear force (WBSF), and sensory evaluation. The pH, instrumental color redness (a*), yellowness (b*), chroma (C*), and MetMb% values were increased, while the lightness (L*) and hue angle (h*) values showed non-significant (p > 0.05) differences in both LL and GM muscles in all aging periods. The cooking loss increased while WHC decreased until 35 days of aging. MFI values significantly (p < 0.05) increased, while WBSF values decreased; in addition, sensory characteristics were improved with the increase in the aging period. Overall, the color, tenderness, and sensory characteristics were improved in LL and GM muscles until 28 and 21 days of aging, respectively. Based on the evaluated meat characteristics, 28 days of aging is required to improve the meat quality characteristics of LL, whereas 21 days of aging is suitable for GM muscle.